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Abstract 

We introduce new fast canonical local algorithms for discrete and continuous spin systems. We 
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I. INTRODUCTION 



Spin systems are one of the most studied subjects in physics for their own interest but 
also because many other problems can be mapped on them. Since few exact calculations are 
available, numerical Monte Carlo simulations are extensively used to study them. One of the 
most popular method is the Metropolis algorithmfl], rightly because of its simplicity and its 
general applicability. The Metropolis algorithm is the easiest one to implement since it uses 
no prior informations and automatically matches the properties of the systems. However, 
using more of the available informations one should obtain algorithms better suited to spin 
systems. The algorithms proposed in this article follow this strategy at the price that they 
are slightly more complicated. To help implementations all programs discussed in this article 
are accessible at our homepage [2]. 

The algorithms studied in this article use only local updates. The non-local cluster 
algorithms are certainly better suited to study ferromagnetic second order phase transitions 
but cannot be used for frustrated spin systems and for first order transitions. Moreover, 
even when available, cluster algorithms should be used in combination with local algorithm 
[3]. The gain in efficiency quoted in the present article must be understood as the gain for 
the part corresponding to the local algorithm alone. Further we have checked the influence 
on the gain of other algorithms like the exchange algorithm [4] , the over-relaxation algorithm 
[5] or the multicanonical algorithm [6]. We think that, if the multi-spins coding cannot be 
implemented, the methods introduced in this article are the fastest possible local updates 
and no real improvements could be made. If the multi-spins coding can be implemented it 
should be used because the gain will be better. However, this method is less general than 
the methods introduced in this article and moreover exist only for discrete spins. 

Simulations to test the algorithms have been done for several lattices with frustrated 
or non frustrated spin systems, using small lattice sizes L. The examples studied are the 
two-dimensional ferromagnetic square lattice (2s), the two-dimensional antiferromagnetic 
triangular lattice (2t), the three-dimensional antiferromagnetic stacked triangular lattice 
(3t), the three-dimensional ferromagnetic and anti-ferromagnetic cubic lattice (3c & 3ca), 
and the four- dimensional spin glasses on a cubic lattice with random ±J interaction (4sg). 

The comparative efficiency of the algorithm studied does not depend on the size, due to 
the local nature of the updates. For a second order transition the integrated autocorrelation 



2 



time (see for example the appendix of [7]) follows the same law r = r ■ L z with z ~ 2 and 
only the value of To depends on the algorithm used. In a strong first order transition r has 
an exponential form for a local algorithm but the multicanonical algorithm in combination 
gives a power law behavior with a different z [6]. Also, because algorithms are local, the 
type of lattices and interactions do not have a strong influence on the comparative efficiency 
of the algorithms. The important parameter is the temperature or more precisely the local 
field, that is the sum of the neighboring spins associated with interactions divided by the 
temperature. Therefore the methods introduced are not restricted to the lattices studied 
here but will be applicable to other lattices as well. 

Most of the algorithms proposed use arrays to store values calculated prior to the simula- 
tion. The performances of the programs depend therefore crucially on the access time to the 
memory and in particular on the size of cache memory. For the simulations an Athlon 1800 
processor with 250 Kb cache memory has been used. This amount has quadrupled to 1 Mb 
with new processors now available (in 2004) and therefore the performances of the proposed 
algorithms should also increase. Moreover the performance also strongly depends on the 
compiler used. We used the Intel compiler "ice" with the optimization "-0" and obtained a 
code two times faster than the one with the "gec" compiler. 

In the following we start with a reexamination of the detailed balance condition. Then we 
introduce different algorithms trying them out on several types of discrete and continuous 
spins. The discrete spin systems we consider are the ±1 Ising spin, the general Ising spin 
with S > 2 states, the Potts and the clock models. For the continuous spins we investigate 
continuous Ising spins — 1 > S > +1, the Ising 4 -model, XY spins, Heisenberg spins, the 
Heisenberg 4 -model, and also the O(N) spin models for N > 4. 

II. DETAILED BALANCE CONDITION 

The basis in the simulation techniques we are going to discuss is the detailed balance 
in the updating process[8]. There the essential point is symmetry in time: the probability 
P(A) of an initial configuration A multiplied by the transition probability Ta->b to a new 
or final configuration B should be equal to the probability for the inverse process starting 
from the configuration B with P(B) and transition probability T b ^a, so that 

P{A)-T A ^ B = P{B)-T B ^ A . (1) 

3 



In choosing the transition probabilities T properly the spin configurations will be visited 
with the weights P and these are for a canonical system the Boltzmann weights. To be more 
flexible the detailed balance condition (1) should be rewritten in the form 

PU) f(B,A)-T A ./,- = J%*Lf(A,B).T B ^ A (2) 



f(B, A) ' ' ™ f(A,B) 

with an auxiliary function / depending on the two states A and B. There are essentially 
two ways to satisfy these equations, the "heat bath" method and the "Hasting" method, 
this last one includes the Metropolis method. 

In the first case the transition probability T A ^ B = T(B) and the function f(A, B) = f(B) 
depend only on the final state B, so that the detailed balance condition is reduced to 

T A ^ B = P{B) = ^||| ■ f{B) . (3) 

The simplified form T(B) = P(B) should be used if the probability can be integrated "easily" 
and inverted like for the discrete spins or Heisenberg spins. In principle any probability 
function P(B) depending on one variable could be integrated and inverted numerically and 
the result stored in tables. However, this method is not recommended for spin systems like 
XY spins for example, since the tables become too large and the algorithm will be costly in 
computing time to get sufficiently small errors. For such a case it is better to use the second 
form of eq. (3) and proceed as follows. 

The first procedure is the "heat bath" method. One finds a new configuration B = 
F _1 (ran) using a random number "ran" and the inverse of the function F which is the sum 
or the integral of /. Having found the new state B, it is accepted with the probability jj^j 

P( B) 

with a new random number < ran 2 < 1 by the usual rejection method if -j^j > ran 2. 
The function f(B) must be always larger than P(B) but the difference should be small 
since otherwise too many choices for the new states will be rejected. The simplified case 
corresponds to f(B) = P(B) and no second random number is needed. 

The second procedure is Hasting's method [9] with the transition probability equal to 



■A->B 



Putting this into (2) it is not difficult to see that the detailed balance condition is indeed 
fulfilled. The function / can here be smaller or larger than P which is of advantage compared 
to the stricter condition for the heat bath method. If one chooses f(A, B) = 1 for any A and 



B one gets the transition probability of the Metropolis method [1]. The gain in efficiency 
due to Hasting's generalization can be quite large in comparison to the Metropolis update 
depending on the spin model, the choice of the function / and the temperature range of the 
simulation. 

Whatever the method used will be, it should be ergodic, i.e. all configurations should be 
accessible. Since our special choices for the function / will be of a kind for which the local 
updates do not change in an essential way from the ones in use, we will not come back to 
this point. In the following we have to treat discrete and continuous spin systems separately. 

III. DISCRETE MODEL 
A. Simulation methods 

We present in this section six algorithms we have implemented for discrete spins. We 
assume that the spins have q possible states (1, 2, • • • q). The actual or old state is A and 
the new state B as before. 

1. Metropolis 

The first method one tries usually is the Metropolis algorithm [1] abbreviated as Me in the 
figures. As discussed in the preceding section in eq. (4) the choice is f(A, B) = 1 whatever 
the states A, B are. The implementation is as follows: 

1. Choose randomly one of q states B using int(g • rani) with rani a random number in 
the range < ran x < 1 or in the interval [0, 1[. The function "int" gives the largest 
integer smaller than the floating point number. 

2. Accept this new state with another random number, if P(B)/P(A) > ran 2 . 
There are two problems to be noted: 

a. The choice of the new state can be identical to the old one. 

b. The choice of a new state is equiprobable, but in particular at low temperatures the 
Boltzmann probabilities P(B) will be quite different. This fact will lead to a large 
and unproductive rejection rate. 
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2. Restricted Metropolis 



A way to solve the first problem is to choose the new state randomly omitting the actual 
state. We denote this as the restricted Metropolis algorithm or Mes- With the choice 
f(A,B ^ A) — 1 and f(A, A) = eq. (4) is practically the same as for the Metropolis 
algorithm Me. The implementation is therefore: 

1. Choose randomly one state B using int((g— 1) • ran i) and use a list to find randomly 
one of the q — 1 states different from A. 

2. Accept this new state if P(B)/ P(A) > ran 2 . 

Since the actual state is no longer tested using Meg one gains compared to Me an amount 
1/q in the updating process. If q is large, this difference becomes negligible. For the Ising 
two-states case only the opposite state is tested anyway. But as a standard for q > 2 it is 
not used, as far as we know, where also the performance is improved in a measurable way 
as we will show. 

As to the second problem b described at the end of the previous section it will be addressed 
by the heat bath method discussed next. In addition a restricted heat bath will solve both 
problems a and b simultaneously. 

3. Direct Heat Bath 

In discrete system the number of states is finite and one can simply choose the function f — P 
in eq. (3). We call it here Direct Heat Bath or abbreviated DHB. The implementation is 
simple: 

1. Calculate the Norm = P(0) + P(l) + • • • . 

2. Take a random number ran in the range [0, 1[. 

3. If ran < P (I) /Norm, the new configuration is chosen as I. 

4. Else if ran < (P(l) + P(2))/Norm, the new configuration is chosen as 2. 

5. and so forth . . . 

This algorithm has two flaws: 
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a. The new state can be identical to the old one. 

b. If there are too many possible states, the algorithm becomes very slow. 

The first problem is similar to the Metropolis algorithm but at least the new state is chosen 
among really accessible states according to their probability P{B). Therefore the acceptance 
rate is 1. 

This algorithm is always more efficient than the Metropolis algorithm Me and also the 
restricted Metropolis method Me$ . Only for Ising spins S = ±1 the Me$ method is more 
efficient than this heat bath method. 

4- Direct Heat Bath with Walker's alias 

To solve the problem of the inefficiency of the Heat Bath method for a larger number of 
states q we propose to use Walker's alias method [10]. Since this procedure was, to our 
knowledge, never used in spin simulations, it needs some explanations. 

Walker's alias method handles in an economic way which new state to choose among the 
q possibilities. The probabilities P(q) for a new state i are not piled up on top of each 
other to sum up to 1 as in the simple heat bath described before, but stored in q different 
boxes of equal height 1/q. Walker's construction has in each box only one or two different 
probabilities. For an example with q = 3 see fig. 5(b). Before the simulation starts one must 
have calculated and stored the probabilities Pi imit which divides each box i. The upper 
states in each box must also be stored in an array. These states as "subtenants" have an 
"alias" whereas the lower ones have the box number as correct address for the state i. 
The implementation has the following steps: 

1. Choose the box % using % = int(g • ran) +1 with a random number ran G [0, 1[. 

2. If ran < Pl imit , choose the new state as i, otherwise take the new state as Alias [i] . 

The consumption time is therefore independent on the number of states. The only limitation 
is the memory needed to store the arrays. The method to generate the arrays can be found 
in the ref. [11] and the implementation at our homepage [2]. The gain, using this method 
in comparison to the standard heat bath, is from one to ten depending on the number of 
states. In the figures when using this implementation of the heat bath algorithm, we denote 
it shortly as AW for Alias Walker. 
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5. Restricted Direct Heat Bath 



We want now to improve the heat bath algorithm by avoiding to choose the new state 
identical to the old one. The old state is A, the chosen new state B and the other accessible 
states C. To find the form of the function / in eq. (4) we start with the heat bath method 
and get f(A, B) = P( ff^ (c) • The denominator would be 1 and f(A, B) = P(B) if P(A) 
were included in the sum P(B)+P(C). For the reverse process starting from B and choosing 
the state A in the accessible states A + C, we have f(B, A) = p^+p^ ■ 
The implementation of the algorithm is therefore: 

1. Choose B in B + C using heat bath method, see above. 

2. Accept B if p^+p^i] > ran according to eq. (4). 

We call this algorithm the restricted heat bath or DHBs . For two states as the Ising ±1 
spins, this algorithm is equivalent to the restricted Metropolis one Me$ . 



6. Restricted Direct Heat Bath with Walker's Alias 



Finally, as above we make use of the Walker algorithm to accelerate the simulation. We call 
this algorithm the restricted Alias Walker Hasting Algorithm or abbreviated AWHs . The 
arrays to store the values calculated before the simulation are q times larger than for the 
AW algorithm because they depend on the actual state which can take q values. 

We treat thereafter the following discrete spins: Ising ±1, Blume-Capel model, Potts 
model and clock model, and compare the efficiency of each algorithm. 



B. Ising ±1 spins 

We consider here the standard Ising spin system. The spins Si — ±1 are under the 
influence of a local field h generated by the neighboring spins. They have an energy 

Ei = — T ■ h ■ Si (5) 

with the temperature T and the local field 

<j> 



where the sum is over the neighboring spins. The probability e E ^ T for the orientation of 
the spin Si can be written without normalization as 

P(±l) = e^ h . (7) 

Following the last section we discuss the various algorithms and their implementations for 
the Ising spins accessible at our homepage [2]. 

Since there are only two possible states the restricted heat bath DHBs and the restricted 
Metropolis Me$ algorithms are the same. Moreover no Walker aliases are needed. This is 
different for a multi-spin update for four spins on a square with 2 4 = 16 possible configura- 
tions. Each of the four spins has two neighbor spins outside which contribute to h the values 
— 2/T, or 2/T. We implement for this case both the heat bath AW4 and the restricted 
heat bath AWH 4S algorithms using Walker method. 

We put the results of the simulations for a square lattice with a size L = 10 in fig. 1. The 
integrated autocorrelation times r (see [7] for its determination) are displayed in fig. 1(a) 
for comparing the different algorithms. The maximum at the critical temperature T c is 
indicated by the black squares. As to be expected Tue > tdhb because the heat bath 
method chooses the new state according to its probability. This property is general and 
therefore also tdhb > t dhb s because the new state is better chosen by the restricted method 
DHBs than by the direct method DHB. 

For Ising spins this is not difficult to analyze. Imagine that the spin state is —1 and the 
local field h positive. The probability to get to the +1 state is e~ h / (e h + e~ h ) — \j (e 2h + 1) 
with the heat bath DHB whereas simply e~ 2h for the restricted heat bath DHBs. Since 
e -2h > iy/( e 2h _|_ !) an j s j n g S pi n j s more often flipped in the latter case. Similarly if the 
actual state is +1, the probability to get the state -1 is l/(e~ 2h + 1) < 1 with the heat bath 
DHB and strictly 1 with DHB & or the equivalent restricted Metropolis procedure. These 
differences are small if h 3> 1 but become noticeable for smaller h. For the two-dimensional 
ferromagnet on a square lattice small means already the critical temperature where a third 
of the flips occurs for h = 0. This fact explains also the difference in the acceptance rate 
between DHB and DHBs shown in fig. 1(c), 

It is evident by comparing fig. 1(a) with fig. 1(c) that a bigger acceptance rate corresponds 
to a smaller autocorrelation time. One can also see that a two times smaller acceptance rate 
double more than twice the autocorrelation time. The reason may be that some parts of 
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the lattice flip more than average, some parts less, and these slow spins have a dominant 
influence on r. We will come back to this point when we study the continuous spins. 

The multi-sites heat bath AW 4 and AWH 4S improve the situation compared to the single 
site DHB, but are somehow less efficient than the DHB$ or the restricted Metropolis Meg 
procedure. This procedure, usually called simply "Metropolis" in the literature is therefore 
the most efficient algorithm for ±1 Ising spins. 

For the judgment of the numerical efficiency the important parameter is not the autocor- 
relation time, but the actual time the computer consumes. If an algorithm A has a r two 
times smaller than an algorithm B, but is ten times slower to execute, it is better to use the 
algorithm B. We take as consumption time the product of the autocorrelation time r by 
the actual simulation time of one Monte Carlo step. 

The simulation time is plotted in fig. 1(b). For one MC step Me, DHB and DHB§ = 
Me<5 algorithms use almost the same time. As to the results for AW 4 and AWH 4S the four 
spins algorithm AW 4 is 40% faster than the three previous one spin algorithms since the 
four spins are flipped simultaneously. The benefit is lost with the AWH 4 s method even if 
implementations are almost the same. The disadvantage is due to the limited fast cache 
memory of the processor with an access time 10 times faster as ordinary memory. Using 
the Walker's method the informations to be stored becomes 16 times larger for AWH 4 s 
than for the AW 4 . For the same reasons, when the temperature increases, the number of 
configurations simulated increases, and the number of variables stored in the cache memory 
overflows its capacity. This explains the behavior of AWH 4 s. 

Finally we show in fig. 1(d) the consumption time of the computer to compare the effi- 
ciency of each algorithm. At the critical temperature, the restricted heat bath DHB$ and 
the restricted Metropolis Me$ are 4 times more efficient than the Metropolis Me, and 2.5 
times more efficient than the heat bath DHB. It is interesting to compare the efficiency of 
AW 4 and AWH 4 s. The last algorithm has a smaller autocorrelation time than the former, 
but a larger time of simulation and therefore it is less efficient. Even the AW 4 is not really 
more efficient than DHB$ = Mes. Therefore for the ±1 Ising spin the restricted heat bath or 
Metropolis algorithm should be used since also a single-site algorithm is simpler to program. 

We checked that these results do not change markedly as a function of the size. This is 
expected since all the algorithms we compare are local, the autocorrelation times have the 
same behavior and their comparative efficiency remains the same. 
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In addition we tested the two and three-dimensional antiferromagnetic triangular lattices 
and the three-dimensional ± J spin glasses on a cubic lattice. The results are alike and even 
more in favor of the DHBs = Me$ compared to DHB and Me. This is so because the number 
of zero local fields is more important in these cases, up to 30% at low temperatures for the 
triangular antiferromagnetic lattices. 

C. General Ising spin and the Blume Capel model 

The Blume-Capel model is defined by a Hamiltonian 



with the first sum over nearest neighbor pairs and the second one over all N spins. Each 
spin has 2 S + l components Si = —S, —S+l, ■ ■ ■ S. This model can be used to describe a 
mixture of He 3 and He 4 . Initially the model had three components [12]. Later the model 
was extended from S = 1 to S — 3/2 [13] and could be generalized to any S. 

We apply the single-site algorithms defined previously. For this model, contrary to the 
Ising model, the restricted Metropolis Me$ is no more identical to the restricted heat bath 
DHBs and its corresponding Walker method AWH$. 

In fig. 2 we present results for this model. The graphs for acceptance rate are shown in 
fig. 2(a), equivalent to the Ising case in fig. 1(c). The simulations are for a square lattice of 
size L = 10 for S = —1, 0, 1 with A = 0. AWH$ and DHBs have the same acceptance rate 
since they are only variants of the same restricted heat bath algorithm. This is also the 
case for the heat bath algorithms DHB and AW. The autocorrelation times r, not shown 
here, follow a similar pattern as the acceptance rates and therefore tawh 5 — t dhb s < taw = 

T~DHB < T~Me s < T~Me- 

In fig. 2(b) the consumption time is shown for the same model, that is S — ±1,0 and 
A = 0. At the critical temperature the restricted heat bath algorithms DHBs and AWHg 
have the same efficiency, whereas the heat bath algorithms DHB and AW are 25% less 
efficient and the restricted Metropolis method 50%. The Metropolis algorithm, usually used 
in the literature is 2.7 times slower. The consumption time at the critical temperature if 
A ^ in the fig. 2(c) exhibits similar features as seen in fig. 2(b), which are even more 
pronounced in favor of the restricted heat bath. 




(8) 



<ij> 



<i> 



11 



In fig. 2(d) the consumption time at the critical temperature for A = is displayed as 
function of S. For very large S the restricted and not restricted algorithms give similar 
autocorrelation times. However, the simulation times for one step are not the same for the 
algorithms and in particular the Walker algorithm AW and AWH$ should give better results. 
For intermediate values of S shown we observe that all the heat bath algorithms for S > 3/2 
are almost equivalent and much more efficient that the restricted Metropolis Me$, which is 
more than 1.5 times slower, and the simple Metropolis Me more than a factor 2.3 . 

Since the algorithms are local these results hold for other lattices as well and larger sizes. 
In the next section some results are shown as a function of the size and for different lattices. 

D. Potts model 

The Potts model [14] is defined by a Hamiltonian: 



5 qiqj refers to the q state Potts spin with 8 qiqj = when ^ ^ qj and 8 qiqj = 1 when = qj. 
For the two-dimensional square lattice with ferromagnetic interactions the transition is of 
second order for q < 4 and of first order if q > 4. This model is still extensively studied in 
particular with antiferromagnetic interactions [15]. 

In fig. 3 we plotted the result for the ferromagnetic three state Potts model (q = 3) on 
a square lattice (2s). The system size is L = 20. The results are very similar to the Ising 
two states case, but as clearly seen in fig. 3(b) the Walker method improves the situation 
compared to the heat bath by a gain of 30% for the direct heat bath and of 50% for the 
restricted heat bath. From the fig. 3(d), at the critical temperature the restricted heat bath 
with Walker's alias AWH S is 50% more efficient than the same heat bath without Walker's 
alias DHB S , and more than two times better than the heat bath procedures AW and DHB. 
The factor increases to more than three for the restricted Metropolis Me$ and to six when 
Metropolis Me is used. Obviously this ratio depends on the implementations and the ones 
used here are accessible at [2]. 

For the same model and at the critical temperature in fig. 4(a) the graphs for the con- 
sumption time are shown as a function of the system size L. One observes that the results 
do not depend on the size in a marked way. 




(9) 
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In fig. 4(b) we show the consumption time at the critical temperature as a function of 
q. For q — 10 the system exhibits a strong first order transition and we have used the 
multicanonical algorithm [6] in combination to the local algorithms. For large q restricted 
algorithms are equivalent to non restricted ones. The heat bath algorithms are much better, 
w 8 times faster for q — 10, than the Metropolis ones. We observe an increase of the 
consumption time for AWH$ algorithm for q = 10. This is due to the size of arrays needed 
to store which exceeds the possibilities of the cache memory. For more details see the Ising 
±1 section and the problems for the AWH 4S - 

Since all the algorithms are locals the results hold for other lattices and interactions as 
well. In the fig. 4(c) and fig. 4(d) we plotted the results for the acceptance rate and the 
consumption times for the anti-ferromagnetic three state Potts model on a three-dimensional 
cubic lattice (3ca). We get similar results at the critical temperature with a gain of four 
when we compare the AWH S to the Metropolis algorithm Me. 

E. Clock model 

The q state clock model [16, 17] is a discrete version of the continuous XY model. The 
Hamiltonian may be written as 



For q = 2 this is the Ising model and for q — > oo the XY model. An example for q = 3 is 
shown in fig. 5(a). 

We adapt as before different local algorithms to this model. In fig. 5(b) we plot an 
example for the Walker "boxes" described in the section ref III A 4. For this model it is 
advisable to change the restricted Metropolis method in the sense that a new state q new is 
confined to the neighborhood of the old state q actual . The simplest way is to take the new 
position within the limits q actual — 5 < q new < q actual + § with 5 varying from 1 to q/2. 

In fig. 5(c) the autocorrelation time is displayed for the 10 state clock model with ferro- 
magnetic interactions on a square lattice (2s) as a function of the temperature T oc 1/h and 
5. At low temperatures the best algorithm is the one which tests only new states close to the 
actual one and at high temperatures it is better to test all possible states. At temperatures 
between an intermediate 5 works best. 




(10) 
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Before looking at fig. 5(d) we turn to fig. 6. The model tested there is the six state 
clock model on the two-dimensional square lattice (2s) of size L = 10 with ferromagnetic 
interactions. It is clear that the Walker method improves considerably the heat bath methods 
as can be seen in fig. 6(b). Obviously this depends on the implementations and the ones used 
in these figures are accessible at our homepage [2]. The gains at the critical temperatures 
(there are two critical temperatures [16]) are quite appreciable in using the heat bath Walker 
method AW. This method is at least four times faster than the Metropolis methods as can 
be seen in fig. 6(d). In the figures the si Me^ corresponds to the optimal choice of 8 for 
each temperature. 

Now we come back to the fig. 5(d). The consumption times at the critical temperature 
(the higher temperature of the two critical temperatures) are displayed as function of the 
number of states. Whatever q is, we observe that the Walker heat bath methods are much 
more efficient than any other method. Moreover for large q we notice that the restricted 
Walker heat bath AWHs algorithm becomes less efficient than for low q. The reason is 
that the arrays become too big to be stored in total in the cache memory, a problem we 
encountered before (see the Ising ± section). Interesting is also the result for the q — 2 
case corresponding to the Ising model. We should get identical results for AWHs and Me$ 
since they are the same algorithm in this case. Indeed the same autocorrelation times are 
found but the simulation times differ and therefore also the consumption times. The reason 
is that our implementation which treats any q does not use special features of the q = 2 
case. Again the reader should be aware that a good implementation is as important as a 
good algorithm. 

IV. CONTINUOUS SPINS 

We study now continuous spin systems. The Hamiltonian h should have the form 

H=-Tj2h t -S l (11) 

i 

similar to (5) and (6) with 

j 

A magnetic field term or any other potential function depending only on norm of the spin 
Si can be added. We note that it is possible to write the dipolar spin spin interaction this 
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way. This form is dictated by most of the improved simulation methods we are going to 

discuss. For more details about this limitation we refer to the section IV I. 

The probability to find the spin i in the state Si is given by the Boltzmann factor 

P(Si)-dSi = e hi - Si ~ hi ■ d Si 

— e h ' x ~ h ■ dx for continuous Ising spins x G [—1, 1] , (13) 

= e h - cos x ~ h ■ dVL for 0(N) spins with N > 2 (14) 

with 

dtt = dx for N = 2 with x G [-7T, +tt], (15) 

= sinxdx-dy for N = 3 with x G [0, n] & y G [0, 2?r] , (16) 

= sin 2 xdx-smydy-dz for N = 4 with x, y G [0, it] & z G [0, 27r] (17) 

and similarly for > 4. In eq. (13) and (14) the suffix % has been dropped. The additional 
factor e~ h in the definition of the probability P is a kind of normalization. 

A. Simulation methods 

In this section we will present the methods to be tested on continuous spin systems. The 
starting point are again eqs. (2-4). 

1. Metropolis algorithm 

As for discrete spins, the easiest algorithm to implement is the Metropolis method Me. 
It consists of choosing randomly new spin directions and applying the condition (4). For 
N < 3 this is not a difficult task. However for N > 3 we cannot integrate and invert the 
weight of dfl in Eq. 17. In this case the standard procedure is to calculate the new spin 
direction using Gaussian random numbers. More details about the different spin types will 
be found in the corresponding section. For any system the Metropolis procedure is much 
less efficient, at least by a factor two, than the algorithms we will discuss below. 
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2. Restricted Metropolis algorithm 

The restricted Metropolis algorithm Mes is here defined in a similar way as for the 
clock model in section HIE. The new state is chosen around the actual one restricted by 
x actuai <- x new < ^acW _|_^ us j n g Hasting's method (4). This method is usually applied for 
the XY spin N = 2, but could be used for any N > 2. The results obtained are better than 
for the Metropolis algorithm but still this method is less efficient than the ones introduced 
further down. 



3. Direct Heat Bath 

The direct heat bath DHB can be used only for continuous Ising spins —1<S<1 and 
Heisenberg spins with 3 components putting simply f(B) = P(B) in (3). The probability 
is for both cases eq. 13, that is P(x) ■ dx = e h ' x ~ h ■ dx with x G [—1, 1[. The cumulative 
probability F given by the normalized integral of P is then 

S-i n*) dx 



Fix) - 

CI P{x) dx 



e +h _ Q -h 



(18) 



so that < F(x) < 1. In fig. 7(b) F(x) is plotted. From a random number F = ran between 
and 1 one can then determine — 1 < x < 1 by inverting (18): 

x = 1 + \ ■ log((l - e~ 2h ) ■ ran + e~ 2h )) . (19) 

This formula has been used for an implementation [18] which actually is easier than for the 
discrete spins described in the previous section III A 3. With a slight modification one can 
improve the efficiency especially for large h or low temperature. One generates a value for 
x in the limit — oo < x < 1 by 

x = 1 + ^ • log(ran ) (20) 

and exclude the values outside the range [-1,1]. This procedure must be iterated until 
one obtains a value between — 1 < x < 1. This way one does not need to calculate e~ 2h 
but a number of choices are rejected and additional random numbers must be generated. 
One assumes here that h is positive so that the probability P(x) oc e x ' h can be extended 
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to arbitrary negative values of x. If h is negative one has to change the sign and put 
x =>- x ■ sign(/i). At the critical temperature and below typically one has h > 1.5 for various 
models and the efficiency increases by 25% using the simpler rule (20) instead of (19). 

For vector spin models with n — 2, 4, and more components the direct heat bath method 
cannot be applied since the weights are too complicated for a simple integration. One is 
therefore forced to choose another form for the function / in Eq. 3 and use the rejection 
method. The choice is very broad and many functions have been tried especially for XY- 
spins with two components. However, the function chosen are particular for each model. 
We introduce below three completely general algorithms applicable to any model with an 
energy of the form given in (11-12). 

4- Fast linear algorithm 

For an efficient heat bath simulation a function f > P should be chosen for which the 
integral is easy to determine and also the inversion poses no problem. For a maximum gain 
in simulation time we find that the best choice is a function with n steps. For an example 
see fig. 7(a) and (c) where we present choices of step functions for the continuous Ising spin 
S (— 1 < S < 1) and h = 5. The idea is to use this together with the Walker algorithm or 
a variant of it. We first describe the method, called the Fast Linear Algorithm (FLA), in 
detail. 

a. h is constant We first consider the case h = const where h is defined in eq. 12, and 
then show later how to deal with the real situation where h can vary. 

The function / is made of n constant parts /«. We choose the fa with two conditions: 



with the constant area a to be determined. For an example see the fig. 7(a). The first 
equation ensures that / is always greater than q. The second is useful for a simple and 
faster way to invert the cumulative probability F. Indeed we see that F is composed of n 
intervals of straight lines (integration of n constant functions). With (22), all intervals are 
equal as shown in fig. 7(b). Therefore, only a few steps are necessary to invert F. This is 
done as follows. 



fi = 



maximum of q(x) in the interval , 



(21) 
(22) 



fi ' ( — Xi) — a , 
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Choosing a random number between and 1, we find the interval corresponding i 



% = int(n • ran ) (23) 

where int means the conversion from real to integer. Having %, it is not difficult to find x 
that we are looking for: 

x = Xi + (n ■ ran — i) • (a^+i — x t ) . (24) 

Since we use only conversions and simple operations, this method is extremely fast. We 
come back to this point later when we conjecture that it is impossible to find a faster 
algorithm for this class of problems. Moreover, for n — > oo the acceptance rate, i.e., the area 
of P divided by the area of / between [—1 : 1[, tends to 100%. Typically with n being equal 
to some hundreds one obtains a rejection rate (=1— acceptance rate) of few percents. 

Technically, we find the {xi} and the {/*} using (21-22) and store the value of a and {xi}. 
This must be done once before the Monte Carlo simulation begins. Then apply (23) and 
(24) at each Monte Carlo step, calculating using (22). 

To determine the {x{\ we use an iterative procedure. First we fix a. Next, using (21) and 
(22), we calculate the corresponding {x^ and {/»}. When x[n] > 1 we reduce a, otherwise 
we increase it. The procedure is stopped when 1 < x[n] < 1 + e, with e being the accepted 
error. Then, we fix x[n] = 1 and = ^z^— q ■ Since e is positive, /„_i obeys (21). One 
possibility is to put the initial value of a at a = A/n, A being the area of P(x) between 
[—1,1] calculated numerically. 

b. h is variable Now we have to consider the case where h is no longer constant. We 
notice Ph x > Ph 2 if hi < hi with hi the norm of the vector hi, therefore hi > 0. We divide 
the possible range of variation of h in M parts corresponding to hi, h 2 , ... ,h M and use 
the table calculated at hj for hj < h < hj + i. Due to this procedure the acceptance rate 
decreases. We decide to choose the ({hj}) so that the rejection rate is less than a threshold 
using: 



area of P^ — area of Ph j+1 



threshold . (25) 



area of P^ 

The area of Ph is calculated numerically. 

However, the values {hj} are not linearly distributed. Therefore, we create a new table 
tableh with K elements. In each Monte Carlo run we calculate the value k with a conversion 
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from real h to integer k and the tableh gives us the value of hi. More specifically: 



k = int(K ■ (h - h )) , 
hi = tableh [k] , 



(26) 
(27) 



where h is the first value of hj. The tableh has to be filled before the beginning of the 
Monte Carlo simulations. K, the number of elements of the table, is free, but must be large 
enough to be able to differentiate the hj. 

In summary, we have a way to generate the value x with a probability f(x) using (26), 
then (27), then (23) and finally (24). Using this value of x we apply the rejection method, 
i.e. accept the new value if jj^j > ran 2 , where ran 2 is a random number in the interval 
[0, 1[. We call this algorithm the Fast Linear Algorithm because we use only linear equations. 
Few steps are necessary to obtain the final value x looking at the C program given at our 
homepage [2]. 

c. Advantages and flaws First we would like to argue that no other algorithm using 
the rejection method could be faster than the Fast Linear Algorithm. 

The rejection method needs two uniformly distributed random numbers at each Monte 
Carlo step. Step 1 is to get a value x from a known cumulative of the test function / and 
step 2 is to compare the value f(x) to the probability. To compare different algorithms 
using this method, it is enough to calculate the time necessary to perform step 1 and step 2 
without considering the time to produce the random numbers since it will be the same for 
all algorithms. 

The consumption time can estimated in comparing the various possible operations exe- 
cuted by the computer. We take the following equivalences: basic operation (*,+,...) = 
lunit, conversion from real to integer (int) = 3 units, (if) = 3 units, square root (sqrt) 
= 6 units, and calculating a function like cosine, exponential or logarithm = 13-18 units. 
These estimates are only approximate, but sufficient for comparing the different algorithms. 

To apply the FLA one needs two conversions from real to integer according to eq. (23) 
and (26), and a few basic operations like additions, multiplications, and the use of tables. If 
we count the time necessary for our algorithm very roughly, we get 20 "units" . Therefore, 
the time necessary for one step of the FLA-algorithm is comparable to the time needed to 
calculate one function. Another algorithm, to be as fast as FLA, should use only once a 
calculation of a function. However, in most of the cases, it will never reach such a high rate 
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of acceptance comparable to FLA of nearly 100%. Therefore the algorithm FLA should be 
always the fastest algorithm. This conclusion holds only when the rejection method is used, 
not for the heat bath methods when the probability can be integrated and inverted as for 
a classical Heisenberg model. However, even in this case it will be shown in the following 
that this heat bath algorithm will not be faster than the FLA algorithm. 

We note another advantage of this method. It is completely general and can be applied 
to all types of probabilities. Moreover, we have introduced the FLA with a probability on a 
fixed range [— 1, 1[. However, it is possible to handle also the case of an infinite range with 
a somehow more complex program, as can be seen in the section dealing with the Ising c/> 4 
model. 

The only flaw of FLA is that a certain amount of memory must be available to store the 
tables. For a Hamiltonian which can be factorized like in eq. (11-12) there are two variables, 
x and h, to handle. Typically we get files which are less than 100 Kb for a rejection rate 
of about 15% percents. If one wants to decrease this rate the size of the tables increases 
and the cache memory could become too small to store the arrays. A test of the maximum 
number of steps which minimize the consumption time is needed at each simulation. See for 
an example fig. 8(b). 

5. Walker's Algorithm 

The last algorithm is very efficient, but it is not so easy to calculate the {xi} and it 
becomes really complex in presence of more variables Xi,yi, In this case the Walker 
algorithm could provide a good solution. The only difference between the Fast Linear 
Algorithm (FLA) and the Walker algorithm (AW) is the choice of Xj. Instead of (22) we 
choose the X{ at fixed interval, for example on the fig. 7(c) the dashed line ("AW without 
xh")- Another possible choice to avoid to get a lot of informations in the region where the 
probability is almost zero is to choose X\ = xh and the others x at fixed intervals between 
x x and 1 ("AW with x H n ). 

Whatever the choice is, since (22) is no more valid, we cannot use only (23) to calculate 
%. We must use the Walker's alias introduced previously (section III A 4). We must calculate 
the integral of each step function, i.e. the area f\ ■ {xi + \ — Xi). Walker's algorithm is used 
to divide them in n equal boxes as shown in fig. 7(d) for the step function "AW with xh" 
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shown in fig. 7(b). During the simulation, we have to use, as for the discrete spins, an if 
condition to choose the correct "state" . Therefore in addition of (23) we must use: 



if (ran > P l limit ) i = Alias[i] . 



(28) 



And (24) must replaced by 



x = x i 



+ ran ■ [x i 



box, fin 




) 



(29) 



with x ° x,mi and x° x are the limit for each box and "state" . Otherwise the algorithm is 
similar to the FLA. 

The difference of this algorithm and the FLA is that we have to store much more data in 
the arrays. Precisely we have to save the {Pt imit }, {x* ox ' ini }, {x* oxJin }, {Aliasfi]} and also 
{fi} to apply the rejection method. This must be compared to a and {x^ for the FLA. As 
a consequence the arrays are 6 times bigger for the AW than for the FLA. To be efficient 
arrays must be stored in the cache memory and therefore the AW is less efficient than the 
FLA. For example see fig. 8(d) and fig. 11(b). 

However the arrays are very easy to create contrary to the ones for the FLA, even with 
more than one variable. We will present two examples with two variables below when 
treating Heisenberg spins with a 4 potential. 

6. Walker's Algorithm with Easting's method 

Another advantage of the Walker algorithm over the FLA is that we can use Hasting's 
method. Indeed to apply this last method we need to be able to get not only F _1 (ran) but 
also f(x actual ) as seen in the Eq. 4. To obtain easily f( x actual ) we need a simple law for x i: 
the simplest being the [xi] distributed at constant intervals. Then we use the Hasting's 
method (4) in combination with the Alias Walker. We call it, as before, the Alias Walker 
Hasting (AWH) algorithm. 

One big advantage of the Hasting's method compared to the heat bath-rejection method 
is that we do not need to choose the function / bigger than the probability P. This proves 
very useful if the Hamiltonian cannot be factorized like H — hj • Sj (11). In this case the 
FLA cannot be applied but the present method works. 
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In the next section we present the vector spin with dimensions N from 1 to 6 separately. 
We try to be as complete as possible in order that the reader can use the proposed methods 
easily. For each case the C programs are accessible at our homepage [2] . 

B. Continuous Ising spins —1 < S < +1, N = 1 

The continuous Ising spin varies from -1 to +1 with an uniform probability so that the 
Boltzmann weight according to (13) is 

P(Si) ■ dSi = e h - x - h ■ dx , x e (-1, 1) . (30) 

We explain in more detail the results for this first example of continuous spins since the 
vector spin cases N > 1 discussed later will be technically similar. We restrict ourselves to 
three types of algorithms. 

The first is the Metropolis algorithm Me where one chooses the new value x new uniformly 
between -1 and 1. This value is accepted with a probability P(x new ) / p{x actual ) according 
to eq.(4) otherwise the old value x actual is kept. To gain a better understanding how this 
procedure works we turn to fig. 7(a) where the probabilities for a local field h = 5 and 
h — 10 are plotted. The Metropolis choice for the function fue ="1" is also shown. Since 
Hasting's method is used, the function / is automatically renormalized to p( x actual ). If 
P(x new ) > p(x actual ) the new state is always accepted, which corresponds to x new > x actual . 
Otherwise for x new < x actual the new state is taken with a probability P{x new )/ P{x actwd ). 
Graphically the acceptance rate is equal to the ratio of the areas under P(x new ) and under 
the straight line jue — P( y x actual ) for / > P. If the local field h increases or the temperature 
decreases, this ratio decreases tending to zero for low temperatures as can be seen in the 
fig. 7. 

The second method, the heat bath DHB, does not have this defect. Formula (19) is 
used to find the new value x new . Here the rate of acceptance is 1. Updating with formula 
(20) instead creates difficulties since small local fields occur too frequently. The cumulative 
probability F is displayed in fig. 7(b). 

The third type of simulation is Walker's method including the Fast Linear Algorithm FLA, 
the Alias Walker algorithm AW and the Alias Walker Hasting algorithm AWH. For the two 
last algorithms we choose x 1 with P(x 1 ) = 0.01 and the others {xi} are fixed at constant 
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intervals between x\ and I. We plot such a construction in fig. 7(c) under the label ll AW 
with xh" ■ This somewhat arbitrary choice of x± is recommended since for — 1 < x < x\ the 
probability is small enough and therefore it is not necessary to provide detailed informations 
on this range. The corresponding Walker aliases are plotted in fig. 7(d). 

The FLA fixes automatically this last problem. In fig. 7(a) the step function J'fla is 
shown for five bins and the corresponding cumulative probability in fig. 7(b) . There the 
dashed lines are equidistant because of the condition (22). In this method the acceptance 
rate is the ratio of the area under P and under /fla in fig- 7(a). This ratio can be made 
close to 1 by increasing the number of bins. The deviation from this ideal value we will 
denote by "error" in the following. 

We compare now in fig. 8 the efficiency of the different algorithms for an antiferromagnet 
on a three-dimensional stacked triangular lattice (3t). A system size of L = 6 is sufficient for 
the test. Fig. 8(a) shows the refusal rate of FLA, AW and AWH as a function of the "error" 
given as input to the programs create_01 . out and create_01_walker . out [2] used to 
create the arrays. The actual refusal rate is approximately only half of the "error" value for 
the FLA and the AW, and one fourth for the AWH. It is worth to note that AW and AWH 
use the same information stored in an array. The difference is that the Hasting algorithm is 
automatically renormalized to the value p( x actual ) as explained before, and therefore it fits 
the probability distribution P better. 

The fig. 8(b) shows the consumption time for the last three algorithms at T = T c ps 1.2. 
The three curves have a similar behavior. At first, if the error decreases by increasing the 
number of bins the refusal rate will decrease, and therefore the consumption time decreases 
also. However if the number of bins becomes too large, the array cannot be kept in the fast 
cache memory, the time of simulation increases and the gain in the acceptance rate does 
not compensate the loss in the simulation time of one MC step. We remind the reader that 
the consumption time is defined as the product of the simulation time of one MC step and 
the autocorrelation time. This last quantity being shorter for a higher acceptance rate. We 
observe that the FLA has the smallest consumption time because the array to be stored is 
six times smaller. 

We display in fig. 8(c) also the dependence of the acceptance rate for the various algorithm 
on the temperature or more precisely on the inverse of the local field 1/h oc T. The number 
of bins for the algorithms (FLA, AW and AWH) is chosen to minimize the consumption 
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time. The acceptance rate of Me goes to zero when the temperature decreases as has been 
discussed before. As a consequence the autocorrelation time (not shown) increases without 
limit. As can be seen in fig. 8(c) the AWH acceptance rate is larger than the AW rate, even 
if the same array is used. The FLA and the AWH have a high acceptance rate for any h 
comparable to the acceptance rate of the direct heat bath DHB which is strictly 1. 

There is a difference between the acceptance rate of the Hasting algorithms AWH and 
Me on one hand, and the heat bath algorithms AW and FLA on the other hand. In the first 
case, after a refusal the same spin is kept. In the last case a new spin is chosen until it is 
accepted, and therefore the new state is always different from the old one. We will see the 
implication of this fact in the next paragraph. 

The most important parameter in numerical simulations, the consumption time, is dis- 
played in fig. 8(d). Clearly the FLA algorithm is the most efficient one even compared to 
DHB. It it interesting to compare the result for Me with the inverse of the acceptance rate 
multiplied by the ratio of the times of simulation for Me and FLA, plotted with the symbol 
%. If the system were composed of only one spin, the two lines would collapse. We observe 
that the consumption time for Me is above the % line. Indeed, if the spins of some parts 
of the lattice are flipping more than the average, some group of spins flip much less than 
the average and these last spins have a strong influence and the autocorrelation time r, and 
therefore on the consumption time for the simulation. 

We have tested different lattice sizes L in order to check that results discussed do indeed 
not depend on L, since the algorithms are local ones. For different lattices the only change is 
the value of the local field h. Looking at our homepage [2] one can see the nearly collapsing 
curves for an anti-ferromagnet on a two-dimensional triangular lattice (2t) and a ferromagnet 
on a square lattice (2s). 

In the literature, studies on the continuous Ising model are not very common. To our 
knowledge the most recent study is for two-dimensional ferromagnetic interactions of long 
range using the Metropolis algorithm [19]. The use of the heat bath methods FLA or 
DHB would increase the efficiency because an acceptance rate close to 100% is even more 
important if the interaction is of long range. Since the simulations for the continuous Ising 
model (N = 1) is similar to the Heisenberg case (N = 3) all conclusions for the former 
model hold also for the latter. 
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C. Ising cf> 4 model, N = 1 



In this section we study the Ising 4 model [20] or Ginzburg-Landau model on a lattice 
where the spin variable x varies from — oo to oo. The Boltzmann weight is 

P(Si) ■ dS % = e h-x^-x^-if . dx (31) 

and in fig. 9(a) this probability is plotted for h — 1 and A = 1.3182. It is interesting to 
study this model for its own sake. Here it serves as an example how to cope with the infinite 
boundaries of the probability distribution in the FLA, AW and AWH algorithms. Cluster 
algorithms exist for this model but one needs a local algorithm to vary the norm of the spin 
[20]. The probability (31) cannot be integrated in a simple way and therefore no direct heat 
bath method is available. 

With the restricted Metropolis Me$ algorithm the new spin x new variable is chosen around 
the actual one: x actual — 5 < x new < x actual + 5. Then the Hasting formula (4) is applied with 
the function f Mes = 1 plotted in fig. 9(a). We compare here the efficiency of the restricted 
Metropolis algorithm Mes only with the Fast Linear Algorithm FLA which is faster than 
AW and AWH. 

The FLA is slightly different from the one described in the previous section since a first 
step starting at x = — oo and a last one ending at x n = +oo would lead to a divergence for 
the cumulative function F. The solution is to take two exponential functions for the first 
interval [xcb^i] an d the last one [x„_i, x n ], and otherwise to take a step function for /. An 
example is given in fig. 9(a) for n = 11 steps. Unfortunately the implementation is also a 
little bit more complicated than previously. 

Fig. 9(b) gives the consumption time at different temperature for the FLA as function of 
the "error" given to the program create_01_Phi4. out (accessible from our homepage [2]). 
The best choice is 20% corresponding to n = 38 bins. 

The fig. 9(c) and (d) display the acceptance rate and the consumption time for a ferro- 
magnet on a cubic lattice (3c) for a system size L = 6. We observe that the FLA is much 
more efficient than the Metropolis algorithm. At the critical temperature represented by 
the squares, the Metropolis Me$ is almost three times slower than the FLA. 

Since our algorithms are local, results do not depend on the system size nor on the 
type of the lattice. For this model there exist cluster algorithms and also an over-relaxation 



25 



algorithm [5] . We must employ them in combination (the section devoted to XY spins shows 
results for the FLA in combination with over-relaxation). Moreover if we are interested 
in frustrated system there are no more cluster algorithms available and an efficient local 
algorithm is fundamental. 

D. XY and U(l) variables, N = 2 

The XY spin is a two-dimensional vector of norm one. Since there is an equivalence 
between the direction of the spin and a phase, both characterized by one angle, XY spin 
systems and U(l) gauge theory (see for example [21]) have a common basis. Due to the 
varied interests many algorithms have been tried. We want to compare them to the Fast 
Linear Algorithm FLA which proves to be the fastest algorithm, and the Walker Hasting 
algorithm AWH. 

The Boltzmann probability is simply 

P(Si) ■ dSi = e h - cosx ~ h ■ dx (32) 

with the angle x varying between — n and ir. 

Contrary to the probability of the former case N — 1 given by (30) the cumulative 
probability and its inversion is too costly in computer time and consequently no direct heat 
bath method DHB is possible. One must use the heat bath rejection or Hasting's methods 
instead. 

We have tested first the standard Metropolis algorithms Me for which f(x) = 1 in eq.(4). 
The new angle x new is chosen randomly between — it and it and accepted according to the 
probability P which is shown in fig. 10(a) for h — 4. If the local field h increase, the 
probability is more peaked near the origin and the acceptance rate decreases as explained 
before. The restricted Metropolis Mes gives better results. In this case x actual — 5 < x new < 
x actual + 8 as indicated in fig. 10(a). For a small enough 5 the acceptance rate can increases up 
to one, but then the spin configuration will change very slowly and the autocorrelation time 
becomes large. This situation is the same as for the clock model described in section HIE. 

For the heat bath method many different functions f(x) > P(x) in the interval (— ir, ir) 
have been proposed. For the auxiliary function / of eq.(3) Moriarty [22] proposed an ex- 
ponential form (Mo), Edwards et al. [23] a Gaussian form (G), and Hattori and Nakajima 
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[24] an ingenious hyperbolic cosine function (H). For clarity we plotted in the fig. 10(c) these 
functions together with the probability P for h = 4. 

In principle the FLA and the AWH algorithms follow a similar strategy The FLA step 
function / is shown in fig. 10(a) for 9 bins. As explained before one should choose the optimal 
number n for the intervals to minimize the consumption time. This is shown for the FLA 
for three different temperatures for an anti-ferromagnet on the three-dimensional triangular 
lattice in fig. 10(b). With the choice of n = 56 bins the Boltzmann weight is sufficiently well 
approximated by an error of 15% and this choice will be almost independent on the lattice 
type. 

Before looking at fig. 10(d) where the Metropolis procedure is compared to the more 
efficient FLA procedure, it is interesting to compare all the different algorithms just reviewed. 
Simulations are made for the two-dimensional triangular lattice (2t) and for the spin glass 
with ±J interactions on a four-dimensional cubic lattice (4sg). The results are presented in 
fig. 11 for a system size of L = 12 for the 2t-lattice and for L = 4 for the 4sg-lattice. 

In fig. 11(a) where the acceptance rate is displayed, one observes that the FLA has the 
best acceptance rate followed by the Hattori algorithm. However, the consumption time 
displayed in fig. 11(b) is the fundamental parameter for a comparison. Then the situation 
changes dramatically. The FLA is still the most efficient of all algorithms with a gain of 
60% compared to the Gaussian algorithm G and 200% compared to the Metropolis one Me 
at the critical temperature. The Hattori's algorithm becomes the worst of all algorithms 
(at least for T > T c = 0.514, see [25]) in spite of a very good acceptance rate. This is due 
to a very long time of simulation of one MC step since the function / and the cumulative 
probability are too complicated. This criticism is in agreement with [26]. 

Fig. 11(c) shows the results at the critical temperature for the Metropolis Me and Mes and 
for the FLA algorithms. These algorithms are combined with the over-relaxation algorithm 
[5] where the number of these additional updates is denoted by OR. The combination allows 
a gain of a factor two for the FLA and three for the Me. The gain in using FLA instead of 
Me is 3 without an over-relaxation step but it decreases to almost 2 with one or two such 
OR-steps which consume very little additional time. The best number of over-relaxation 
steps depends on the temperature or the value of the field h and on the lattice size. 

Fig. 11(d) displays the consumption time for a spin glass on a four- dimensional cubic 
lattice (4sg) with ±J interaction [27]. In combination with the local algorithm the exchange 
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algorithm [4] is used. There is almost no difference between the heat bath algorithms FLA, 
G, Mo and H, compared to the (2t) case shown in fig. f 1(b). Also the Metropolis procedure 
is "less worse" when used in combination with the exchange algorithm. It is "only" 1.5 
times less efficient than the FLA at the critical temperature shown by the squares. This 
factor grows until 2 times for Me s or 3 times for Me at lower temperatures, one is interested 
in for the spin glasses. The best simulation technique should use the FLA procedure in 
combination with the over-relaxation and the exchange algorithm. 

As discussed before the results do not vary strongly as function of the size. Further 
results for the three-dimensional antiferromagnetic stacked triangular lattice (3t) and for 
the two-dimensional ferromagnetic square lattice (2s) can be found in our homepage [2]. 
They are practically the same, if taken as a function of the relevant variable that is the local 
field h (12). 

We want to come back to fig. 10(d) where the Metropolis algorithm Me is compared to 
the FLA for various lattices as function of the temperature. One notices that the FLA is 
the most efficient algorithm for all lattices tested. This result is completely general. 

E. Heisenberg spins, N = 3 

For Heisenberg spins as three component vectors with unit norm the distribution function 
is according to (14) and (16) 

P(6, 0) • sin 6d6-d4 = e h - cos 9 ~ h • sin 9 d6 ■ dxj> 

= e h - x ~ h ■ dx ■ d<P (33) 

with varying from — n to n, 9 from to n, and x = cos9 from —1 to 1. P is of the 
same form as (30) for the continuous Ising spin —\<S<\ multiplied by an additional d(f>. 
Therefore all algorithms used for this last model can also be applied to the Heisenberg or 
0(3) model. In addition we test two other algorithms. The first is the restricted Metropolis 
algorithm Me ,5 where x actual — S< x new < x actual + 5 and the second one the direct heat bath 
DHB 2 where instead of formula (19) for an update (20) is used. 

For testing the algorithms an anti-ferromagnet on a the three-dimensional stacked tri- 
angular lattice (3t) is selected. In fig. 12(a) the probability P(h = 5) is shown together 
with the "steps" the FLA procedure is using. The Metropolis function f Me = 1 is also dis- 
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played. In fig. 12(b) the consumption time for the FLA for different temperatures is plotted 
as function of the "error" . From the figure the best choice is where the Boltzmann weight 
is approximated by an error of 15% which corresponds to n = 55 bins. 

It is interesting to look also at the acceptance rate shown in fig. 12(c). For the DHB the 
acceptance rate is 1, but not so for the DHB 2 variant because of the rejection of value less 
than -1 using (20). 

Fig. 12(d) shows the consumption time for the algorithms. As can be seen the best 
algorithm is the "new" heat bath DHB 2 . It is 25% more efficient than the usual heat bath 
DHB and three times better than the Metropolis algorithms at the critical temperature 
shown by the squares. 



F. Heisenberg </> 4 model, N = 3 

In this section we study Heisenberg spins with a norm y varying between and oo. A 
potential depending on y is added so that the Boltzmann probability of (33) is changed to 

P (6, 0, y) -dy -sin 6 d9-d(f) = e h-v<x*e-v 2 -Hv 2 -i) 2 . dy ■ sin 9 d6 ■ d<f) 

= e h-yx~y 2 -Ky 2 -i) 2 . dy ■ dx ■ d<f) . (34) 

We plotted this probability for h = 2 and A = 1.3182 in fig. 13(a). To study this model for 
its own sake is an interesting task. Here it serves as an example how to handle at the same 
time two variables in the probability distribution. We tested four algorithms. 

First we apply two single variable algorithms consecutively for the variable x and then for 
the variable y. We try two restricted Metropolis procedures Me 2 $ and also two fast linear 
algorithms FLA 2 . 

In updating both variables x and y in a single Monte Carlo step we use as a first method 
a restricted Metropolis algorithm where x actual — 5 X < x new < x actual + 5 X and y actual — 5 y < 
ynew < yactual _|_ ^ The consumption time is minimized by adjusting 8 X and S y to the 
temperature changes. The second method is the Alias Walker Hasting algorithm AWH. 
We did not use the FLA algorithm since it became too difficult to be implemented for two 
variables. In any case the array would be almost as large as for the AWH algorithm since 
both coordinates x and y for each box must be stored. 

In fig. 13(b) we show the consumption time for simulating a ferro magnet on a three- 
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dimensional cubic lattice for different temperatures using AWH algorithm. The number of 
intervals n y for the y variable is varied for n x = 10 to find the best value n y . Similarly a 
graph could be shown for dependence of the time on the number of bins n x for the x variable. 
We found that the best choice is n x = 10 and n y = 20. This corresponds to n = n x -n y = 200 
bins. 

In fig. 13(c) and (d) the acceptance rate and the consumption time are shown for all four 
algorithms. One sees that AWH is the best one but almost equivalent in efficiency to the 
two fast linear algorithms used consecutively FLA 2 . The Metropolis algorithms Me$ and 
Me 2< 5 are far less efficient. 

In conclusion it seems not worth to consider both variables x and y at the same time 
and it is better to update them consecutively. However, this conclusion may hold only if the 
probability has a form similar to the one in fig. 13(a). The situation will be different if, for 
example, there exists two or more peaks in the probability distribution connected by regions 
of low probability. In this case the AWH algorithm should become much more efficient. 

G. 0(4) spins 

The 0(4) spin is a four components vector with norm unity. The probability distribution 
using (14) and (17) is similar to the one appearing in SU(2) gauge theory [28, 29] and has 
the form 

P(6 2 ,9 1 ,(f)) ■ sm 2 6 2 d0 2 -sm0 1 d0 1 -d(i> 

= e h - cosd2 ~ h ■ sin 2 6 2 d6 2 • sin^ d6 1 ■ d<j> 

= e h - X2 ~ h ■ ^/\-x\ dx 2 ■ dx! ■ d(j> , (35) 

with varying from — n to ir, 6i from to n, x 2 = cos 6 2 and X\ = cos 6i from —1 to 1. This 
probability is not integrable in a simple way and therefore no direct heat bath DHB can be 
used. 

Three types of Metropolis algorithms are compared. The first is the standard one for 
N > 4. The four components are taken randomly using Gaussian random numbers and 
then are normalized to get a unit vector. We denote this method by Mec in the figure. 
The second one, denoted by Me, we introduce here, will be faster than the Me G method for 
not so large iV and equivalent in performance to Me G for iV — > oo. We choose randomly 0, 
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x\ and %2 and accept the last value with a probability \f \ — x\ < 1 . 

The last one is the restricted Metropolis Mej. We follow the second Me procedure but 
restrict the choice of the new value by x 2 lctual — 5 < x 2 ew < x 2 ctual + 5. 

We found three other algorithms in the literature, mainly used in the context of the 
577(2) gauge theory. The first one is the Creutz algorithm Cr [28]. It fits P of (35) with an 
exponential like the Moriarty algorithm in the XY case (see fig. 14(a)). 
The second one is due to Fabricius et al. [30], which we call Fa. With a change of variables, 
they fit the probability P depending on x 2 of (35) by the function ^1 — x 2 ■ e h ' X2 ~ h , but the 
price to pay is a more complicated algorithm. 

The third Ke is similar to the last one, but is slightly modified to gain in speed. It is due to 
Kennedy and Pendleton [31]. 

We have implemented also the FLA procedure. We calculate x 2 and determine and X\ 
randomly. In the final stage we use these values to create a new spin around the local field 
h. All the implementations of the methods introduced in this section are accessible at our 
homepage [2]. 

The results of the simulation are displayed in fig. 14 for the three-dimensional anti- 
ferromagnet on a stacked triangular lattice (3t). Fig. 14(a) we show the probability and 
our function fFLA(x 2 ) with n = 20 bins and h = 2. P(x 2 ) reaches its maximum at 
x™ ax = (y/l + 4 h 2 — l)/2h which is useful for the application of formula (21). In the 
same figure the function of Creutz and the Metropolis probability are also shown. 
The fig. 14(b) shows that an error of 15% corresponding to n = 55 bins is the best choice for 
the FLA. The acceptance rate is shown in fig. 14(c) and in fig. 14(d) the consumption time 
of the various algorithms. Again the FLA algorithm is the most efficient algorithm. At the 
critical temperature shown by the squares the gain is a factor 5 compared to the Me or Me$ 
and reaches even 9 in comparison to the standard Mcq- 

For the SU(2) gauge theory we do not have to update and x 1: therefore the situation 
is even more in favor for a simulation with the FLA. In the interesting region h rs 1/16 [31], 
the gain is of 30% compared to Kennedy's algorithm Ke, 80% to Fabricius's algorithm Fa, 
and 120% to the algorithm of Creutz Cr. 

Since the algorithms are local, the results are valid for different lattices and sizes. 
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H. O(N) spins 



The 0(N) spins, that is vectors of unit norm in N dimensions, do not have an exper- 
imental realization, but can be helpful for a better understanding of the nature of phase 
transitions [32, 33]. The probability can be written as 

P(6 N , •••,#1,0) • sm N ~ 2 e N ^ 2 d6 N _ 2 • • • sin 6 l d9 l ■ d<\> 

= e h - cosdN - 2 ~ h ■ an N - 2 6 N - 2 dB N - 2 • • • sin 6 1 dQ x ■ d<f) 

= e h - XN - 2 - h -(l- x 2 N _ 2 )^ dx N _ 2 ■■■dx 1 -d<P (36) 

with varying from — ir to 7r, &i from to 7r, Xi = cos(#j) from —1 to 1. 

The probability for O(N) spins is not amenable to the direct heat bath DHB technique. 
This problem is similar to the 0(4) case with the probability given by (35). In addition for 
the 0(N) case the variables {xns, ■ ■ ■ , ^2} can also not be dealt with in a simple way. See 
fig. 15(a-c) where for N = 6 the probabilities are shown for X4, X3 and X2 ■ 

We have used the same Metropolis algorithms as in the previous section, that is the 
simple Metropolis Me, the restricted Metropolis Me$, and the standard one Mec- For 
the two first algorithms Me and Me$, the {xat_ 3 ,-- - ,x 2 } variables are determined using 
a rejection method: a random number x n between -1 and 1 is accepted as x™ ew with a 
probability (1 — x^)^. The refusal rate is proportional to the area between the curve 
"sinus" in fig. 15(b,c) and the curve (1 — x 2 ^)^ . The xn_ 2 is calculated randomly between 
-1 an 1 (or between x actual ± 5) and accepted with Hasting formula (4) with / = 1. 

In analogy to Moriarty's algorithm for XY spin, to the heat bath for Heisenberg spin, or 
to the algorithm of Creutz for 0(4) spins, we introduce an exponential algorithm Ex for the 
xn-2 variable. We calculate xn-2 using xn-2 = — 1 + \ log[l + ran • (e 2h — 1)] and accept 
it with a probability (1 — 2^-2)"^- The others {x n } are calculated in a similar way as for 
Me. 

We want to show that it is profitable to use the FLA. For N = 6 the best choice cor- 
responds to 15% of error and n = 55 bins for x^ as shown in fig. 15(d). For £3 and X2 the 
FLA is also used with 200 bins (see fig. 15(b) and (c) with 10 and 20 bins, respectively). 
The probability P(x N _ 2 ) reaches its maximum at x™ ^ = (a/ (N — 3) 2 + Ah 2 — (iV — 3)) /2h 
which is helpful for constructing the step function / for the FLA with eq.(21). 

In fig. 16(a) and (b) we put the essential results for N — 6, obtained for an anti- 
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ferromagnet on a three-dimensional stacked triangular lattice (3t)[32, 33]. In fig. 16(a) the 
acceptance rate for the different algorithms is shown. We want to point to the differences 
between Hasting's method (e.g. Metropolis) and the heat bath methods (e.g. FLA and Ex). 
In the first case, if the new spin is not accepted, all values x n and the angle are discarded. 
In the other case for the heat bath method one always gets a new state. Therefore the last 
algorithm becomes much more efficient if the number of values x n to be calculated increases. 
In fig. 16(b) one can compare the consumption time for the different algorithms. One sees, 
that FLA is the most efficient of all the algorithms tested. 

Fig. 16(c) and (d) contains a summary of results for different spin dimension 1 < N < 6, 
that is the acceptance rate and the consumption time at the critical temperature for an anti- 
ferromagnet on the 3t-lattice. The algorithms compared are the fast linear algorithm FLA, 
the Metropolis algorithm Me, the restricted Metropolis algorithm Mes and the Gaussian 
Metropolis algorithm Me G . One observes that the FLA is more efficient than any Metropolis 
algorithms and that the advantage is larger for an increasing number of spin components 
N. This conclusion is totally general and does not depend on the system size nor on the 
lattice type. 

I. Other types of spin Hamiltonians 

Until now we only have dealt with Hamiltonians which can be written in the form given 
by (11) and (12). In fact the Fast Linear Algorithm FLA can only be used for this kind of 
Hamiltonian. However, the Alias Walker Hasting procedure is more flexible. 

For example, for a Hamiltonian H = — X) ■ Sj) 3 [34] the ground state is a conven- 

tional ferromagnetic one and at finite temperatures the spins will still be oriented around 
the direction of all their neighboring spins. However, one cannot define a local field h as in 
(12) for this cubic exchange. This means that the FLA algorithm or any other heat bath 
method cannot work. For these methods one needs a function f > P and since there is no 
formula of P(h) definable, one has only the inefficient choice of a constant / equal to the 
maximum of P. However, Hasting's method is not limited by the condition / > P . Among 
these methods the AWH algorithm with a choice for /, for example similar to the case with 
linear interactions we discussed, should offer a possibility for an efficient simulation. 



33 



V. CONCLUSION 



In this article we have tried to review as completely as possible the available canonical 
local algorithms for discrete and continuous spin systems. Starting from the difference be- 
tween Hasting's and heat bath methods we analyzed the algorithm regarding their numerical 
efficiency In all cases except the standard Ising case, we could demonstrate that by mod- 
ifying known algorithms or to construct new ones the efficiency increases compared to the 
standard ones, in particular compared to the widely known and used Metropolis algorithm. 

For discrete spins (Ising model, Blume-Capel model, Potts model and clock model) the 
best algorithm is a restricted heat bath Walker Hasting algorithm where the new state is 
chosen by the heat bath procedure among all accessible states excluding the actual one. Then 
Hasting's method is used to accept or reject the new state. To accelerate the simulation we 
have proposed Walker's alias method, usually not used in this context. 

For continuous spins (continuous Ising spin, XY spins, Heisenberg spin and 0(N > 4) 
spins) we introduced different Walker Hasting's methods and the Fast Linear Algorithm. 
These methods are efficient at all temperatures and achieve a gain at the critical temperature 
of usually more than two compared to the Metropolis algorithm. For low temperature the 
gain becomes even larger. 

We use arrays heavily to store data calculated prior to the simulation. Therefore the 
efficiency of the introduced algorithms depend strongly on the amount of fast cache memory 
of the processor. Since this amount will certainly increase in the future, the algorithms 
proposed should become even more efficient than quoted in this article. 

We tested the algorithms on a number of lattices varying the sizes but the conclusion 
we reach are valid whatever the lattices and sizes are since local algorithms are compared. 
Moreover we think that these improvements of the algorithms applied to spin systems are 
of general nature and could be applied also outside of this field. 
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FIG. 1: Ising model. Results for a ferromagnet on a square lattice. 
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FIG. 2: Blume-Capel model on a square lattice, see text for explanations. 
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FIG. 3: Three state Potts model. Results for a ferromagnet on a square lattice. 
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FIG. 4: q state Potts model. Results for a ferromagnet on square lattice (2s) 
and for an anti-ferromagnet on cubic lattice (3ca). L is the system size. 
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FIG. 5: q state clock model: (a) model, (b) probability and Walker's "boxes", 
(c) and (d) show results for a ferromagnet on a square lattice (2s). 
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FIG. 6: q = 6 state clock model. Results for a ferromagnet on a square lattice. 
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FIG. 7: Continuous Ising spin — 1 < S < 1: (a) & (c) probability P and 
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FIG. 8: Continuous Ising spin — 1 < S < 1. Results for an anti-ferromagnet 
on a 3d stacked triangular lattice (3t), see text for explanations. 
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FIG. 9: Ising </> 4 model, (a) probability P and function Jfla, (b-d) results 
for a ferromagnet on a cubic lattice (3c) discussed in the text. 




FIG. 10: XY spins, (a) & (c) probability P and functions /, consumption time 
(b) for FLA and (d) for Metropolis compared to FLA for different lattices. 
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FIG. 11: XY spins, (a) & (b) results for an anti-ferromagnet on a 2d triangu- 
lar lattice (2t), (c) consumption time together with over-relaxation steps OR, 
and (d) for a 4d spin glass (4sg) in combination with the exchange algorithm. 




FIG. 12: Results for a Heisenberg anti-ferromagnet on a 3d triangular lattice. 
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FIG. 13: Heisenberg r/> 4 model: (a) the probability P depending on x = cos( 
and the norm y, (b-d) results for a ferromagnet on a cubic lattice. 
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FIG. 14: Four components spins 0(4). Results for the 3d anti-ferromagnetic on a triangular lattice. 



function function 




-1 x 3 l o 0.4 error 

FIG. 15: Six component spins 0(6): (a) (b) and (c) probabilities and the 
corresponding step functions /. (d) Search for a minimal consumption time. 




FIG. 16: N components spins. Results for an anti-ferromagnet on a 3d trian- 
gular lattice, (a) & (b) for N = 6. (c) & (d) for different N at T c . 



